テクニカルレポート | GRACEセンター

(1)

ISSN 1884-0760

GRACE TECHNICAL REPORTS

Sound and Complete Validation of Graph

Transformations

Kazuhiro Inaba

Soichiro Hidaka

Zhenjiang Hu

Hiroyuki Kato

Keisuke Nakano

GRACE-TR-2010-04 May 2010

CENTER FOR GLOBAL RESEARCH IN

(2)

(3)

Sound and Complete Validation of

Graph Transformations

Kazuhiro Inaba Soichiro Hidaka Zhenjiang Hu Hiroyuki Kato National Institute of Informatics

{kinaba,hidaka,hu,kato}@nii.ac.jp

Keisuke Nakano

The University of Electro-Communications

[email protected]

May 6th, 2010

Abstract

Transformation of graph structures is becoming more and more important in many fields such as semistructured database or model-driven software development. There, graphs are often associated with schemas that describe structural constraints on the graphs. In this paper, we present a static validation algorithm for the core fragment of a graph transformation language UnCAL [7]. Given a transformation and input/output schemas, our algorithm statically verifies that any graph satisfying the input schema is converted to a graph satisfying the output schema.

Our algorithm is enabled by reformulating the semantics of the core UnCAL, usingmonadic second-order logic (MSO). The logic-based foundation allows to express the schema satisfaction of transformations as the validity of MSO formulas over graph structures. Furthermore, with several insights on the established properties of UnCAL, the prob-lem turns out to be reducible to the validity of MSO overfinite trees, which have sound and complete decision procedure.

1 Introduction

(4)

In these applications, we often assume, for each graph transformation, that its input and output graphs are not arbitrary graphs but have some structure in it. Let us consider, say, a graph transformation that extracts a list of “person names” from a graph-formed database of an “address book”. Input graphs for such a query are assumed to have, e.g., a root node having a bunch of outgoing edges labeled person, each pointing to a node with edges name,address,phoneNo, etc. Similarly, for an input graph satisfying such structural constraints, we expect the transformation to return an out-put graph with a set of name edges. Such constraints on the structure of input/output graphs are expressed by somegraph schema language.

Sometimes, graph transformations written by programmers contain bugs that break these structural constraints imposed on the transformations. For instance, instead of generating a set ofname edges, programmer may write a transformation that produces a set of name edges each preceded by a person edge, forgetting to erase the parent edge. It is relatively easy to check such bugs dynamically; for each run of the graph transformation, we can check the conformance of the concrete input/output graphs to the specified schemas. A question arises here is, whether it is possible to ensure beforehand that such structure-breaking bugs canneverhappen? Dynamic check is not satisfactory, because it only checks the correctness for particular given input graphs.

The objective of this paper is to answer the question affirmatively, by providing astatic validation algorithm of a practical graph transformation language. The problem we would like to verify is the following one:

Validation Problem Given a transformation f, an input

schemasin, and an output schemasout, determine whether “for any graphgsatisfying sin, the output graphf(g) satisfiessout”.

More specifically, we present the validation algorithm for the core fragment of UnCAL graph algebra, which is first introduced as the basis of a graph query language UnQL for unstructured database [6] and later applied to semistructured database [7], and is recently applied to model-driven soft-ware development [15]. Our validation is sound, i.e., we are able to know statically that a validated transformation never produces ill-formed output. Furthermore, it is decidable and complete; the validation process always terminates without any false alarm.

(5)

ones (i.e., graphs whose sharing and cycles are limited to some constant dis-tance), essentially the validation problem is reduced to that on trees. Unfor-tunately, we cannot follow this approach for two reasons. Firstly, since our purpose is to validate transformation programs for more general graphs, lim-iting the input domain to almost-tree graphs does not make sense. Secondly, the original semantics of UnCAL is given in the first-order logic extended with transitive closures, whose validity is known to be undecidable even on finite trees [24].

In order to overcome the difficulty, our approach is different from the traditional tree-decomposition based method, though the spirit is a little similar as we also reduce the problem on graphs to trees. We focus on the fact that UnCAL transformations are well-structured bystructural recursion

that always quite uniformly traverses over argument graphs. The structural-recursion-based nature of UnCAL enables to derive two nice properties called

bisimulation-genericityand compactness, as shown in [7]. To put it plainly,

by exploiting these properties, we prove that if a schema-violation occurs then it must occur within the finite unfolding of the graph. Hence, for the purpose of checking schema conformance, we only need to concentrate on such finite prefixes (called finite-cuts) of graphs. Furthermore, we have found out that for the core fragment of UnCAL under consideration, we can give an alternative presentation of its semantics by using monadic second-order logic (MSO), which is known to be decidable on finite trees [21]. The core UnCAL itself is expressible enough to capture basic subgraph extractions and relabeling/restructuring along the structure of the original input graphs, and we believe that it is a good starting point for constructing a decidable validation algorithm for the full UnCAL.

(6)

Outline The paper is organized as follows. Section 2 explains the graph transformation language UnCAL and our schema language, which are the target languages of our validation technique. The subsequent two sections discuss how the validation problem on arbitrary graphs can be reduced to that on infinite trees (Section 3), and eventually to that on finite trees (Section 4). In Section 5, the validation problem is shown to be expressible in MSO over finite trees. Since the logic is known to be decidable, the validation problem is proven to be decidable at this point. Section 6 shows related work, and Section 7 concludes and presents future direction of the research.

2 Languages

In this section, we introduce two languages concerning our validation tech-nique. One is for describing graph transformations: a core fragment of UnCAL graph algebra [7]. The other is a schema language for describing structural properties of graphs.

2.1 Graph Data Model

We deal with rooted, directed, finite-branching and edge-labeled graphs whose nodes conveying no particular information. We fix the finite setLabel

of labels and the setData of data values throughout the paper. We assume a special labelε /∈Label, and denote byLabelεthe setLabel∪ {ε}. We

usu-ally write the elements ofLabel by typewriter font likea,foo, orname, and write the elements ofData as double-quoted strings like"John"or"3.14". A graph g = (V, E, r) consists of a set V of nodes (sometimes called

ver-tices), a function E from V to a finite set of edges, and a designated root

node r ∈ V. Here, an edge is a pair in the set (Labelε∪Data)×V; the

first component of each edge is the information conveyed by the edge, and the second component is the destination node of the edge. A graph without any sharing (multiple edges pointing to the same destination node) and any cycles is called atree.

Notable feature of the UnCAL’s graph model is that it has ε-edges re-semblingε-transitions of automata, which work as shortcuts between nodes. Schemas and transformations will be defined to respect this intuitive mean-ing ofε-edges. For example, the following two graphs are considered to be semantically equivalent.

• a

b•_D_D ε//•d//•

◦εrr88

εLL&&•

c

/

•e//•ε//•

≡ • d / / • ◦ a ) ) b 8 8 r r

c 55• e

/

•

(7)

Schema ::= roottype Type where Decl· · ·Decl Decl ::= Name ={Edge, . . . ,Edge}

| Name ={Edge, . . . ,Edge,∗}

Type ::= Name | Data | Type pType

Figure 1: Graph Schema LanguageGS

For instance, we do not need a union operator e1 ∪ e2 of two edge-sets explicitly, because it can be simulated by a construction {ε:e1, ε:e2} of a new node having twoε-edges, as exemplified by the root node of the figure above.

Formally, we define the setE→_{(v) of}_{outgoing edges}_{of a node}_v_{as the set}

of non-εedges reachable fromv by traversing onlyε-edges. That is, (l, u)∈ E→(v) if and only if l ̸=ε and there exists a sequence v =v0, v1, . . . , vk of

nodes with (ε, vi)∈E(vi−1) for i >0 and (l, u)∈E(vk).

2.2 Schema Language

A schema describes a restriction to the structure of graphs. For example, one can state that all the outgoing edges for the designated root node must be labeled abc, and each of the destination nodes of the edges may have edges labeledxyzgoing to the same type of nodes, and several other edges. This claim on the structure of graphs can be stated in our schema language as follows:

roottypeT where T={abc:S} S={xyz:S,∗}.

The schema language, namedGS, has the most similarity with the simulation-based graph schema proposed in [5] for UnCAL, but GS is more inclined for describing the structural properties of graphs. The difference will be discussed in detail in Section 6.

Figure 1 defines the syntax of GS, where Name is a set of type names

whose elements are written by san-serif symbols like Apple, and Data is a special type name for Data edges. We require a schema to be well-formed, i.e., every Name in a schema occurs exactly once as a left-hand side of a

Decl, and in eachDecl, there are no duplicateLabels in the right-hand side. Let us explain the idea of each construct by using the following example:

roottypeSNS where

SNS={member:Person}

Person={name:DatapName, email:Data,

friend:Person, ∗}

(8)

The schema describes structural properties of the set of graphs representing the user-network of a social networking service. According to the schema, the root node must have type SNS, that is, all outgoing edges must be labeled member and reach to nodes typable with the Person type. At this point, we only consider the case where the number of the edges is arbitrary; the extension adding cardinality constraints to schema will be discussed later as future work. For a node to have the typePerson, its outgoing edges labeled name have their destination nodes of type Data pName, meaning

that it must be typed by either one of the types Data (merely having a string representation of one’s full name) orName(storing the name in more structured way). Similarly, outgoing edges of a node of type Person with label email must have Data destination nodes, and so on. Since the type definition ends with ∗, it can also have extra edges of other labels with no constraints. Note also that the destination type offriend edges are again

Personitself; this implies that instances of the schema may contain cycles. Formally, for a schemaswritten inGS, we letrtype(s) be the root type of

s,tname(s) the set of type names appearing ins,tdecls(τ) the corresponding

body b such that the declaration τ = b is ins, and ns(ρ) ={τ1, . . . , τn} ⊆

tname(s) for a type ρ = τ1p· · ·pτ_n. The set [[s]] of graphs satisfying the

schema s consists of graphs g = (V, E, r) such that there exists a mapping m:V →2tname(s)∪{Data_}

with the following properties:

1. r ▹mrtype(s) (where v ▹mρ meansm(v)∩ns(ρ) ̸=∅ and is read as

“v has typeρ”).

2. For any node v∈V, having Data∈m(v) implies that for any (l, u)∈ E→(v) we have l∈Data.

3. For any nodev ∈ V, having τ ∈m(v) for τ ∈ tname(s) implies that E→_{(v) satisfies} _tdecl

s(τ). Here the set of edges E→(v) is defined to

satisfy a type declarationtdecls(τ) = {l1 :ρ1, . . . , ln :ρn} if and only

if for any edge (l, u) in E→(v) the label l is equal to one ofli and in

that caseu ▹mρi holds. When tdecls(τ) has a trailing star {· · · ∗}, we

require for any edge (l, u) in E→_{(v) that the label} _l _{must either be}

equal to one ofli and u ▹mρi, or be equal to none of them.

For brevity, we sometimes abuse the notation and say that v satisfies

tdecls(Data) to mean the second condition to hold for the node v. Using

the defined terminology, the validation problem can now be stated as the problem of determining the validity of the following proposition: “for any graphg,g∈[[sin]] impliesf(g)∈[[sout]]”.

2.3 Core UnCAL

The graph transformation language dealt with in this paper is, the

(9)

e ::= {l:e, . . . , l:e} node with edges

| $g variable reference

| if$l =atheneelsee conditional (a∈Label)

| &i output marker

| rec(λ($l,$g).&1:=e, . . . ,&n:=e)(e) structural recursion

l ::= $l label variable reference

| a label (a∈Labelε∪Data).

Figure 2: Core UnCAL Language

core UnCAL. The concrete syntax is shown in Figure 2. Several syntactic

restrictions further applied to the core UnCAL are explained at the end of this section, with comparison to the full UnCAL.

We hope the intuition of the most of the constructs is clear for the reader. Node construction expression {l1 : e1, . . . , ln : en} creates a fresh node v

with outgoing edgesE(v) ={(l1, r1), . . . ,(ln, rn)}where ri is the root node

of the graph obtained by evaluating the expression ei. Variable reference

and conditional branch is defined as usual. The output marker expression&i

is used only in the body ofrecexpressions as explained below. The distinct feature of UnCAL is that basically all graph manipulations are expressed in terms of one unified and powerful construct calledstructural recursion. The expression rec(λ($l,$g). &1:=e1, . . . ,&n:=en)(ea) is evaluated as follows:

first evaluateea and obtain the argument graph, and then, for every non-ε

edge (l, v) of it, evaluate each ei under the environment {$l 7→ l,$g 7→v}.

The output marker expression&j (if any) inei is connected to the root nodes

of the result graphs of the evaluation ofej at the edges inE(v).

Let us look at some examples. The following UnCAL expressiona2d xc

rec(λ($l,$g).

&1 := if$l =athen{d:&1}

else if$l =cthen{ε:&1}else{$l :&1})($db)

replaces all labels aby dand shorts edges labeledcby changing them to ε as follows:

a2d xc(◦a//•

b • c [ [ d / /

•) = ◦d//•

b • ε [ [ d / /

• ≡ ◦d//•

b • b ] ] d / / •.

Here, $db is a designated variable referring to the input graph ande(g) for any UnCAL expression eshould be read as “evaluate eunder the environ-ment $db 7→g”.

More involved example is the following UnQL expression abab

(10)

that changes all edges of even distance from the root node to a, and odd distance edges tob. You may consider the markers &i as a mutually

recur-sive call, and the expression abab to be consisting of two mutual recursive functions. One is &1, which, at each edge in the original graph, generates a new a edge pointing to the result of &2 at the original destination node. Another is &2 that generates b edges pointing to the result of &1 from its destination. The result of the whole expression is defined to be the result of the&1 at the root node of the argument graph. The following figure should be illustrative. The dotted edges denote the edges unreachable from the output root node.

abab(◦c//•d//•e//•) =

◦ a ; ; ; ; ; ; ; • a

• a ; ; ; ; ; ; ; • • b A A • b A A _• b A A • ≡ ◦a

/

•b//•a//•

Formally, the expression rec(λ($l,$g).&1:=e1, . . . ,&n:=en)(ea) is

eval-uated as follows. First, evaluate ea and obtain some graph ga = (V, E, r).

Then, generate n new nodes 1_v _from n_v _{for each node} _v _∈ _V_{, each}

corre-sponds to the marker &i. Then for each edge p = (l, u) ∈ E(v) of v ∈ V,

we evaluate each body expression ei to obtain a graph gp,i. If l = ε, we

let gp,i = ({iv,iu},{iv 7→ {(ε,iu)}},iv), i.e., ε-edges are always kept

un-changed. If l ̸= ε, evaluate ei under the environment {$l 7→ l,$g 7→

u,&1 7→ 1u, . . . ,&n 7→ nu} and get g′p,i = (V′, E′, r′). Then we let gp,i =

(Vp,i, Ep,i, rp,i) = (V′∪ {iv}, E′∪ {iv7→ {(ε, r′)}},iv), makingivthe new root

node1_{. The result graph} _g _{of the evaluation of the whole expression is the} simple aggregation g= (∪

p,iVp,i, v 7→∪p,iEp,i(v),1r) of all the graphsgp,i,

making the&1 output at the root node in the input graph as the root node of the output.

Here is another more realistic example

rec(λ($l1,$g1).&1:={member:

rec(λ($l2,$g2).&1:=

if$l2=friendthen$g2else{})($g1) })($db)

that extracts, from a graph satisfying the SNS schema, the set of members who are being friends of some other member.

The differences of the core UnCAL from the full UnCAL are threefold.

Nest-Free Core UnCAL prohibits nested rec to refer to outer variables,

e.g., for a nestedrecexpressionrec(λ($l1,$g1).· · ·rec(λ($l2,$g2).· · ·&i:=

ei· · ·)(· · ·)· · ·)(e), the inner body ei can only use variables $l2 and

$g2, not $l1 or $g1,

1_{This new root/}_ε_{-edge introduction was implicit in the preceding examples and}

(11)

Positive Core UnCAL does not have ifisEmpty($g) then predicate to check whether the graph pointed by $g is empty or not.

Simplified Markers Uses of markers&i are simplified. We require output

markers&i not to occur directly in the argument expression ea in an

expressionrec(· · ·)(ea); they can only appear in the body expressions

ofrecs (i.e.,rec(· · ·rec(· · ·)({a:&1})· · ·)(e) is not allowed due to &1 in the argument butrec(· · ·)(rec(· · · {a:&1} · · ·)(e)) is ok because it is wrapped in anotherrec). We also restrict the occurrence of input markers&i:= only at the root of the body expression of rec. Besides,

we have dropped the marker-connection operator @ of full UnCAL. In fact, the use of @ is implicit in the core UnCAL; the expression

rec(· · ·)(· · ·) in the core UnCAL should be read as&1@rec(· · ·)(· · ·) in the full UnCAL.

Note that the first and the second constraints essentially lower the expres-siveness, while the third simplification is not so, because all the UnCAL expressions compiled from its front-end language UnQL can easily be writ-ten in the form satisfying the third condition.

As a final remark, let us note one thing about the purpose of the UnCAL language. The reader may find it too primitive and not user-friendly; but this is rather intended. UnCAL is developed as the easy-to-reason-about in-ternal algebra of a more human-friendly graph-query language called UnQL, in the same sense as that the well-known relational algebra is an internal language for the SQL querying language. The validation algorithm for Un-CAL as will be presented in this paper can automatically be applied to the UnQL language, by first compiling UnQL to UnCAL and then running the validation algorithm. Roughly speaking, the restrictions of the core UnCAL correspond to the subset of UnQL queries that cannot take the join or the direct-product of multiple query results. Yet, the “core UnQL” is express-ible enough to capture basic subgraph extractions (as shown in the previous example) and relabeling/restructuring along the structure of the original input graphs.

3 From Graphs to Infinite Trees

Recall that the validity of a proposition of the form “for any graph g, a propertyφholds” referring to arbitrary graphs has no general decision pro-cedure [26] even for some first-order expressive property φ. The validation problem we want to verify–at least if literally written–is in that form: “for

any graphg, if it satisfies the input schemasin, the outputf(g) satisfies the

(12)

problem to that on infinite trees. The concept that plays the most important role here is what is called thebisimulation.

Definition 1. Graphsg1 = (V1, E1, r1) andg2 = (V2, E2, r2) are defined to

bebisimilarand writteng1 ≡g2if there exists a relation (calledbisimulation)

S ⊆V1×V2 satisfying the following conditions: (1) (r1, r2) ∈S, (2) for all (v1, v2) ∈S and (l, u1)∈E1→(v1), there exists u2 such that (l, u2) ∈E2→(v2) and (u1, u2) ∈ S, and (3) for all (v1, v2) ∈ S and (l, u2) ∈ E2→(v2), there existsu1 such that (l, u1)∈E1→(v1) and (u1, u2)∈S.

In fact, UnCAL is designed carefully to regard two graphs equal if they

are bisimilar, in the sense that two bisimilar input graphs always generate

again bisimilar output graphs. Graph transformations written in UnCAL are said to be bisimulation-generic in the sense that the following lemma holds.

Lemma 1([7], Proposition 4). For any transformationf written in UnCAL

and any graphsg1 and g2, ifg1 ≡g2 then we have f(g1)≡f(g2).

Note that the lemma holds even for infinite graphs. Regarding the known fact that any rooted graph is bisimilar to some possibly infinite tree, the bisimulation genericity can be applied to the validation problem in the fol-lowing manner. Under the assumption that schemas sin and sout and the transformationf do not distinguish bisimilar instances, the validation prob-lem is shown to be equivalent to determine the proposition: “for any tree

T that satisfies the given input schema sin, the output f(T) satisfies the output schemasout”. Thus we can reduce the problem from general graphs to trees, which is much easier. In fact, validity of MSO becomes decidable on infinite trees [21] unlike on graphs.

Before formalizing this approach, we need to check the assumption that not only UnCAL transformations but schemas are also bisimulation-generic. This is proved in the next lemma.

Lemma 2. Let s be a schema written in GS and g1 = (V1, E1, r1), g2 =

(V2, E2, r2) be graphs such that g1 ≡g2. Then, g1∈[[s]] implies g2 ∈[[s]].

Proof. Let S ⊆V1×V2 be the witness relation of the bisimilarity g1 ≡ g2

andm1 :V1 →2tname(s)∪{Data}be the type assignment that ensuresg1∈[[s]]. We can construct the type assignment m2 :V2 → 2tname(s)∪{Data_}

on g2 as follows.

m2(v′) = ∪

{m1(v)|(v, v′)∈S}

Let us show that this assignment makes the graphg2 satisfy the schema s. First, for the root node, we haver2▹m2 rtype(s), because we have m2(r2)⊇

m1(r1) due to (r1, r2)∈S, andr1▹m1rtype(s) (recall thatv ▹mρis a

(13)

of m2, there exists some v1 such that (v1, v2) ∈ S and τ ∈ m1(v1), and hence E₁→(v1) satisfies tdecls(τ). Let tdecls(τ) = {l1 :ρ1, . . . , ln :ρn} (the

other cases can be proved similarly), (l′, u′) be any edge in E₂→(v2), and u be some node satisfying (u, u′₎_∈_S _{whose existence is assured because} _S _is

a bisimulation. The label l′ must be equal to some of li’s; otherwise, there

must be an edge (l′, u) ∈ E→₁(v1), which contradicts the assumption that E→

1(v1) satisfiestdecls(τ). Furthermore, u′ satisfies the condition u′▹m2 ρi,

becauseu ▹m1ρi and m2(u

′₎_⊇_m1(u).

This bisimulation-genericity of schemas and transformations allows us to concentrate only on representatives among bisimilar graphs, instead of dealing with all kind of graphs. Let bbe a function from graphs to graphs such that g ≡b(g). Intuitively, b is a function to obtain the representative among the set of graphs bisimilar tog. Then the following lemma holds.

Lemma 3. Letb be a function from graphs to graphs such thatg≡b(g)for

anyg. Letφ and ψ be a bisimulation-generic (i.e., φ(g) =φ(g′) when g≡

g′) properties on graphs, and f be a bisimulation-generic transformation.

Then, the claim “φ(g) implies ψ(f(g))for any graphg” holds if and only if

“φ(g) implies ψ(f(g)) for any graph g in range of b”.

Proof. The ‘only if’ direction is trivial. For the ‘if’ direction, φ(g) implies

φ(b(g)) by the bisimulation-genericity ofφ. Then, sinceb(g) is in the range of b, we have ψ(f(b(g))), which implies ψ(f(g)) by bisimulation-genericity ofψ and f.

It is well-known that any rooted graph is bisimilar to an infinite tree called the unfolding of the graph. Let us formally state the property. Let g = (V, E, r) be a graph. The unfolding procedure unfold(g) is defined as (V′_{, E}′_{, r}′_{) where}

V′ ={(v, p)|v∈V, p is a path fromr tov} E′((v, p)) ={(l,(u, p.(l, u)))|(l, u)∈E(v)}

r′ = (r, ϵ).

Here a pathfrom r to v is a finite list (l1, u1)· · ·(ln, un) of edges such that

(l1, u1) ∈ E(r), (li+1, ui+1) ∈ E(ui), and un = v (if any). ϵ denotes the

empty path and.denotes concatenation. Note thatunfold(g) always yields a tree, i.e., a graph with no loops and sharings, because each invocation of

unfold creates a fresh node. The resulting tree is infinite when the original

graph contains cycles. By taking the bisimulation relationS as{(v,(v, p))| v ∈ V,(v, p) ∈ V′_} _{it is easy to see that} _g _{is bisimilar to} _unfold_{(g). Now,}

(14)

Theorem 1(Graphs to Infinite Trees). Letsinandsout be schemas written

in GS, and f be a transformation written in UnCAL. Then, the claim “for

any graphg, g∈[[sin]] implies f(g)∈[[sout]]” is equivalent to the claim “for

any possibly infinite tree T, T ∈[[sin]] implies f(T)∈[[sout]]”.

It is worth remarking that, theoretically, this theorem in addition to the MSO-definability results in Section 5 already establishes sound and complete validation.

4 Infinite Trees to Finite Trees

Infinite trees are much better domain compared to graphs in that they in fact already allows to give a decidable validation algorithm. Validity of quite a few logics (including MSO [21] that we will use later) cross over the borderline of decidability when we restrict the domain from graphs to infinite trees.

There is, however, a problem with infinite trees regarding practical ef-ficiency. As far as we know, there is no realistic implementation on the validity of MSO on infinite trees. On the other hand, for MSO on finite

trees2, there exists a good practical implementation MONA [13], whose

effi-ciency is verified in many applications. In order to implement a practically efficient validation of graph transformations, it is essential to reduce the problem further to the domain of finite trees. To this end, we show in this section that the validation problem over infinite trees can be reduced to that over finite trees.

The key idea for restricting the input domain to finite trees comes from the following observation: if an input infinite tree causes an error (i.e., gen-erates an output graph not satisfying the output schema), it must be due to some edge(s) finitely reachable from the root node of the tree. In such a case, even if we cut off the infinite continuation below the erroneous edges and make the tree finite, it should still reveal the error.

Let us formalize the notion of the “cutting off”. For trees T1 = (V1, E1, r1) and T2 = (V2, E2, r2), we define the prefix-order relation T1 ≼ T2 to hold if and only if there is a one-to-one mapping e (stands forembedding) from V1 toV2 such thate(r1) =r2 and (l, u1) ∈ E1(v1) iff (l, e(u1))∈E2(e(v1)).

Definition 2. For a possibly infinite tree T, the set of its finite-cut trees

(orfinite-cutsfor short) is cut(T) ={t|t≼T, t is a finite tree}.

2_{Here we mean by MSO on finite trees what is called weak MSO (WSkS) in the}

litera-ture. Precisely speaking, it is MSO on theinfinitek_{-ary tree domain with no}

node/edge-labels, whose second-order variables can range overfinite setsonly. The restriction on the

domain of second-order variable essentially prohibits us to encode infinitely many labeled-edges. Hence, we call it MSO on finite trees. Similarly, we mention MSO on the infinite

(15)

For instance, consider the following example of finite-cuts of a four-node tree.

cut (

•c//•

◦ a 8 8 r r

bL&&

L

• )

= {

◦ ,_◦a_r_r•88 _,

◦ b & & L L •, • ◦ a 8 8 r r

bL&&

L •, • c / / • ◦ a 8 8 r

r , •

c / / • ◦ a 8 8 r r

bL&&

L

• }

More interesting example is the finite-cuts of an infinite tree

cut(◦a//•a//•a//•

)

={◦,◦a//•,◦a//•a//•, . . .

}

that produces infinitely many finite trees.

Definition 3. A setC is said tocover T if it is a subset of cut(T) and for

any t∈cut(T) there exists tc ∈C such that t≼tc.

Intuition is,t≼t′means thatt′contains more information on the original tree t than T. When C covers T, it roughly means that C has enough information to recoverT.

The central player of this section concerning the notion of cuts is the nice property calledcompactness of core UnCAL. In the appendix of [7], it is proved that all positive UnCAL transformations are compact, i.e., instead of transforming an infinite treeT by an UnCAL transformation f, we only need to transform every finite tree of the setcut(T) in order to obtain enough information to constructf(T).

Lemma 4 ([7], Proposition 8). Let T be a possibly infinite tree and f be

a transformation written in the core UnCAL. Then, {unfold(f(t)) | t ∈

cut(T)} covers unfold(f(T)).

The lemma is proved in [7] for a use as an easy-to-use proof method for deriving several optimization laws. Here, we are to show another application of the lemma, to the validation problem of transformations.

Similar property can be proved for our schema languageGS, too. Every finite-cut of a tree T satisfy the schema which is satisfied by the original tree T, and more importantly, if all the finite-cuts ofT satisfy a schema s, then it means the original treeT also satisfies the schemas. In other words,

cut(T) contains enough information to test the schema satisfaction of T.

Lemma 5. The following properties hold for a possibly infinite tree T and

a schema swritten in GS:

1. T ∈[[s]] implies t∈[[s]] for any finite tree t∈cut(T).

2. If there exists a set C ⊆[[s]] that covers T, we have T ∈[[s]].

Proof. Let T = (V, E, r). For the first property, let us assume m to be the

witness mapping of T ∈ [[s]]. Let t = (Vt, Et, rt) be a finite cut of T and

(16)

mt as mt(vt) = m(e(vt)), we can show the schema satisfaction t ∈ [[s]].

For the root node, mt(rt) =m(e(rt)) =m(r) and hence whose intersection

with ns(rtype(s)) is nonempty. For any node vt ∈ Vt with τ ∈ mt(vt),

there cannot be any edge (l, ut) ∈ Et→(vt) violating tdecl(τ), otherwise the

outgoing edge (l, e(ut)) inE→(e(vt)) violates the declaration tdecl(τ).

For the second property, ifT is finite thenC must contain a tree isomor-phic to T itself and thus it is immediate. Consider the case T (and hence C) is infinite. We can assumeC to contain a countable chaint1 ≼t2 ≼. . . of finite trees coveringT. Without loss of generality, we can assume eachti

to have the form (Vi, E|Vi, r) withVi⊆V andE|Vi is the restriction ofE to

Vi. Let Mi to be the set of all type assignments mi :V → 2tname(s)∪{Data}

whose restrictions mi|Vi to Vi are witnesses for ti ∈ [[s]], and Mi(v) for

v ∈ V to be the set ∪

mi∈Mimi(v). Note that from the proof of the first

property of the present lemma, Mi(v) ⊇Mj(v) for any v when i≤ j, i.e.,

a type assignment for a larger cut works also for smaller cuts. Now, we construct the type assignmentm:V → 2tname₍s)∪{Data_}

as follows: m(v) = {τ |τ occurs infinitely often in the sequence M1(v), M2(v), . . .}. Let us ver-ify that the assignment ensures T ∈ [[s]]. First, let us check r ▹mrtype(s),

i.e., m(r)∩ns(rtype(s)) ̸= ∅. Suppose not, then there exists some i such

that Mi(v)∩ns(rtype(s)) = ∅, which contradicts ti ∈[[s]]. Next, let us

as-sume τ ∈ m(v) and check whether v satisfies tdecl(τ). Consider the case

when tdecl(τ) = {l1 : ρ1, . . . , ln : ρn} (other cases are similar). For any

edge (l, u) ∈ E→_{(v), the label} _l _{is either one of} _l

i’s; otherwise, for a cut tk

containing v and u, none of its assignment mk can have τ ∈ mk(u), which

contradicts the assumption that τ occurs infinitely often in the sequence. Thus, w.l.o.g. we assume l = l1. In this case, m(u)∩ns(ρ1) cannot be empty. Suppose it is empty, then none of ns(ρ1) occurs infinitely many in M1(u), M2(u), . . ., which implies the existence of sufficiently large k such that Mk(u)∩ns(ρ1) = ∅. But sinceMk(v) contains τ, this contradicts the

typing oftk.

Similarly to the previous section, by exploiting compactness of both transformations and of schemas, we can show that the validation problem of UnCAL on possibly infinite trees is reducible to that on finite trees.

Theorem 2 (Infinite Trees to Finite Trees). Let sin and sout be schemas

written in GS, and f be a transformation written in UnCAL. Then, the

claim “for any possibly infinite tree T, T ∈[[sin]] implies f(T) ∈[[sout]]” is

equivalent to the claim “for any finite treet,t∈[[sin]] impliesf(t)∈[[sout]]”.

Proof. The former claim immediately implies the latter, because finite trees

(17)

vf = {x, y, . . .} first order variables

vs = {X, Y, . . .} second order variables

tf ::= vf | root first order terms

ts ::= vs | ts∪ts | ts∩ts | ∅ second order terms

φ ::= true | false

| ¬φ | φ∨φ | φ∧φ | φ→φ | φ↔φ standard logical connectives | tf =tf | ts=ts | tf ∈ts | ts⊆ts

| ∃1_v

f.φ | ∀1vf.φ | ∃2vs.φ | ∀2vs.φ 1st and 2nd order quantifiers | vert(tf) | edgel(tf, tf, tf) graph primitives

(l∈Labelε∪ {"data"})

Figure 3: Syntax of Monadic Second-Order Logic

Lemma 4, C′ _covers _unfold_(f_(T_{)). Thus, by Lemma 5 (2), we have that}

unfold(f(T)) satisfies the schemasout. By Lemma 2, this implies thatf(T)

satisfiessout, which derives the former claim.

Whether a treeT satisfies a schemasis equivalent to whetherallthe cut trees incut(T) satisfys. Be aware that, even for a treeT notsatisfying the schema s, there may be some treet ∈cut(T) that does satisfy s(actually, there indeed exists such a tree: single-node tree ({•},• 7→ {},•) satisfies any schema and is always a cut of other trees). This way of correspondence is adequate for our purpose, because we are considering the reduction from a universal property “transformation conforms to the schemas for all infinite tree T” to another universal property.

5 Validation through Monadic Second-Order Logic

So far, we have reduced the validation problem on graphs that determine whether or not “f(g) satisfies sout for any graph g satisfying sin” to the problem on finite trees “f(t) satisfies sout for any finite tree t satisfying sin”. In this section, we show the proposition can directly be expressed as a MSO formula, whose validity is known to be decidable on finite trees [23].

(18)

defined fragment of UnCAL (i.e., the core UnCAL), and provide sound, complete, and terminating validation algorithm for the fragment, which we hope to be a solid framework towards the complete validation of full UnCAL. Section 7 discusses possible directions for enlarging the class of schemas and transformations that can be captured by the decidable logic.

Another logic worth remarking is MSO on infinite trees. Although the logic is also known to be decidable, as explained in Section 4, we prefer MSO on finite trees, emphasizing practically efficient implementation.

5.1 Review of MSO

The syntax of the formula of MSO over edge-labeled graph structure is in Figure 3. The variant of MSO we have adopted is basically that used to describe (2,2)-definable MSO transduction of Courcelle [9], with customiza-tions to adjust for our purpose, namely adding therootconstant and making edge predicatesedge_l to be labeled. For a graph g= (V, E, r) and an envi-ronment Γ that maps first-order variables toV∪Eand second-order variables to subsets of V ∪E, the judgment relation g,Γ φ is defined standardly. We present the definition on the two graph-specific primitives:

g,Γvert(t) if Γ(t)∈V g,Γedge_l(t1, t2, t3)

if Γ(t1)∈V and Γ(t2) = (l,Γ(t3))∈E(v)

where Γ is extended as Γ(root) =r, Γ(t1∪t2) = Γ(t1)∪Γ(t2), Γ(t1∩t2) = Γ(t1)∩Γ(t2), and Γ(∅) =∅, and the judgment relation for other connectives are defined standardly. We writeg φwhen g,Γ φ holds for the empty environment Γ.

One thing we have to note here is that we have single predicate

edge_"data" for edges with data-value labels, in contrast to having dis-tinct edgel predicate for each label l ∈ Labelε. In other words, we are

assuming that all data-value edges in graphs and transformations to be labeled by the same unique label "data". This is justified without loss of generality for the following two reasons. First, for schemas, changing the la-bel for data edges never affects schema satisfaction, because our schema has no way to distinguish each different data label. Second, for transformation, since we are considering the nest-free fragment of UnCAL transformations, we cannot compare two label variables and hence there are no ways to distinguish different data labels either.

5.2 Representing Schemas in MSO

(19)

Lemma 6. For any schemas, there exists an MSO formulaφssuch that for

any graph g the schema satisfaction g∈[[s]] becomes equivalent to gφs.

Proof. Let {τ1, . . . , τn} =tname(s). The concrete construction of φs is as

follows

∃2Tτ1. . . .∃

2_T

τn.∃

2_T

Data.(

root∈union(rtype(s))∧

∀1v. (vert(v)→(

(v∈Tτ1 →ψtdecls(τ1)(v))∧

(v∈Tτ2 →ψtdecls(τ2)(v))∧

.. .

(v∈Tτn →ψtdecls(τn)(v)) ∧

(v∈TτData →ψtdecls(Data)(v))

)))

where union(τ1p· · ·pτ_k) = T_τ

1 ∪. . .∪Tτk. Here, the list of second-order

variablesTτ corresponds to the type assignmentmin the definition of schema

satisfaction. Each second-order variable Tτ is meant to denote the set {v |

τ ∈ m(v)} of nodes assigned the type τ. Hence, v ∈ union(τ1p· · ·pτ_k) is

equivalent to v ∈ Tτ1 ∨ · · · ∨v ∈ Tτk and therefore it is intended to mean

τ1 ∈m(v)∨ · · · ∨τk∈m(v), or equivalentlyv ▹m(τ1p· · ·pτ_k).

The formula ψtdecl_s(τ)(v) means thatv satisfies tdecls(τ) and defined as

follows. When tdecls(τ) ={l1:ρ1, . . . , lm :ρm},ψtdecls(v) becomes

∃2O. (e out(v, O)∧ ∀e. ((e∈O∧ ¬vert(e))→ ∃1x.∃1y. ( (edge_l₁(x, e, y)∧y∈union(ρ1))) ∨

(edge_l₂(x, e, y)∧y∈union(ρ2))) ∨ ..

.

(edge_l_m(x, e, y)∧y∈union(ρm)))

)))

where e out(v, O) is a predicate for computing E→(v). It is intended to become true only when O denotes the set E→(v) of outgoing edges (plus several auxiliary nodes, which are filtered out by the subsequent¬vert(e)). It is defined by using a standard technique to represent transitive-closure in MSO as the least fixpoint

(20)

e out′(v, R)≡(v∈R) ∧

∀1x.∀1e.∀1y.((x∈R∧edge_ε(x, e, y))→y∈R) ∧ ∀1x.∀1e.∀1y.((x∈R∧edge_l′

1(x, e, y))→e∈R) ∧

.. .

∀1x.∀1e.∀1y.((x∈R∧edge_l′

p(x, e, y))→e∈R) )

with {l′₁, . . . , l′_p} = Label ∪ {"data"}. The definition of e out says that O is the least set satisfyinge out′_{(v, O) and the auxiliary relation} _{e out}′_{(v, R)}

says thatRis a fixpoint of the traversal of the graph throughε-edges. Note that, for simplicity of the formulas,R and O contain both nodes (that are reachable fromvviaε-edges) and edges (with non-εlabels, outgoing fromv or the other nodes reachable fromv via ε-edges).

Whentdecls(τ) ={l1 :ρ1, . . . , lm:ρm,∗}having the trail star, ψtdecl_s(v)

becomes

∃2O. (e out(v, O)∧ ∀e. ((e∈O∧ ¬vert(e))→ ∃1x.∃1y. ( (edge_l₁(x, e, y)∧y∈union(ρ1))) ∨

(edge_l₂(x, e, y)∧y∈union(ρ2))) ∨ ..

.

(edge_l_m(x, e, y)∧y∈union(ρm)))∨

edge_l′

1(x, e, y)∨ · · · ∨edgel′p(x, e, y)

)))

with {l′

1, . . . , l′p} = Label ∪ {"data"} \ {l1, . . . , lm}. When τ = Data, the

formulaψtdecl_s(v) becomes

∃2O. (e out(v, O)∧ ∀e. ((e∈O∧ ¬vert(e))→ ∃1x.∃1y. (

edge_"data"(x, e, y) ))),

meaning that all the outgoing edges are labeled"data".

5.3 Representing Core UnCAL in MSO

Next, we express the core UnCAL transformation by using MSO logic. We adopt the formalism for describing graph transformations in MSO intro-duced by Courcelle [9].

Definition 4. A graph-to-graph transformationf is said to be a k-copying

MSO-definable transduction if there exists a constant k and a set of

for-mulavert0(x), . . . ,vertk−1(x),edgel,i,j,m(x, y, z) forl∈Labelε∪ {"data"},

(21)

• For any pair of w ∈ V ∪E and j ∈ {0, . . . , k−1}, we have either gvertj(w) orgedgel,i,j,m(v, w, u) for at most one combination of

i, m, v, u.

• g edge_l,i,j,m(v, w, u) implies g verti(v), g ̸ vertj(w), and g

vertm(u).

• The output f(g) of the transformation is isomorphic to (V′, E′, r′) whereV′₌_{_{(v, i)}_|_v_∈_V _∪_{E, g}_vert

i(v)},E′((v, i)) ={(l,(u, m))|

w∈V ∪E, gedge_l,i,j,m(v, w, u)}, andr′ = (r,1).

Intuitively, k-copying MSO-definable transduction creates k copies of input nodes and edges, and by reorganizing them to form the output graph structure according to the supplied formulasverti(x) andedge_l,i,j,m(x, y, z). The formulaverti(x) indicates that thei-th copy ofx(which is either a node

or an edge in the input graph) becomes a node of the output graph, and

edge_l,i,j,m(x, y, z) indicates that the j-th copy of y becomes an edge from thei-th copy node ofx to them-th copy node ofz, labeled l.

MSO-definable transductions enjoy several nice properties. In particular, the following two properties are important.

Lemma 7([9], Proposition 3.2). (1) The inverse image of an MSO-definable

set of graphs under an MSO-definable transduction is MSO-definable. That

is, if f is an MSO-definable transduction and φ is an MSO formula on

graphs, then there exists an MSO formulaf−1(φ)such thatgf−1(φ)if and

only if f(g) φ. (2) The composition of two MSO-definable transductions

is MSO-definable.

The first property enables to convert MSO formulas on output graphs into that on input graphs. More specifically, instead of saying “the output graphf(g) satisfies the schemasout”, i.e., “f(g)φsout”, we can convert it

to the formula “gf−1(φsout)” on input graphs. Using this conversion, the

validation problem “for any graphg satisfying sin, the output f(g) satisfies sout” can be restated as the validity of the formula “gφsin →f

−1_(φ

sout)”

on input graphs.

Transformations in core UnCAL turn out to be realizable as MSO-definable transductions. The construction is basically to follow carefully the semantics given in Section 2.3.

Lemma 8. Any transformation f written in the core UnCAL is an

MSO-definable transduction.

To illustrate the idea, consider the following simple example:

(22)

According to the semantics, on every edgeeconnecting nodesvanduin the input graph, the body expression is evaluated and generates a new fragment of graph as follows

v //u generates

1_u

1_v ε _//•

a _:_:

t t t

bLL&&

L L

u

and these graphs are aggregated to form the whole output. Here, recall that each marker &i is represented by a newly created nodeiu, and the variable

$g is referring to the destination nodeuof the current edge. The important property of core UnCAL is that it prohibits access to outer variables, which implies that any variable expression inside a body ofrecmust point to the destination node u of the currently processed edge. Actually, this fact is crucial to make core UnCAL to be MSO-definable.

In order to represent the transformation as a MSO-definable transduc-tion, it is natural to construct each i_u _{node as the} _{i-th copy of the input}

nodeu. To represent input subgraphs embedded in the output graph likeu in the example above, we use 0-th copy of the input nodes and edges. Other components (i.e., the nodes and edges created during evaluation of the body expressions) of the output graph is constructed as the copies of the current edgee.

Thus, the example of the output shown above is represented as follows, using the notion of copying

(u,1) (v,1) (e,ε1) //(e,2)

a (e,3)

h h

3

h h

b (e,4)

V V

+

V V

(u,0).

where (x, i) denotes the i-th copy of x. From this picture, we obtain the following set of formulas representing the example transformation as a 5-copying MSO-definable transduction

vert0(x)≡vert(x)

edge_l,₀_,₀_,₀(x, e, y)≡edge_l(x, e, y)

forl∈Labelε∪ {"data"}

vert1(x)≡vert(x)

vert2(x)≡ ¬vert(x)

vert3(x)≡vert4(x)≡false

edge_ε,₁_,₁_,₂(x, y, z)≡y=z∧ ∃1u.edge_∗(x, y, u)

edgea_,₂_,₃_,₁(x, y, z)≡x=y∧ ∃

1_v._edge

∗(v, y, z)

edgeb_,₂_,₄_,₀(x, y, z)≡x=y∧ ∃1v.edge_∗(v, y, z)

(23)

where edge_∗(t1, t2, t3) is the shorthand for the formula edgel1(t1, t2, t3)∨

· · · ∨edge_l_p(t1, t2, t3) with {l1, . . . , lp} = Label ∪ {"data"}. (To be exact,

we had to take into account the semantics of structural recursion that must preserveε-edges in the input graph as-is in the output. For presentation pur-pose, we have omitted the part.) For example, the predicate edgeb,2,4,0(x, y, z) tells that there is ab-edge (y,4) from (x,2) to (z,0) if and only ifx=y and y is an edge going to z, as is illustrated in the picture. Let us repeat again here that it is essential that we do not have nested variable reference in the core UnCAL. If it were (say, suppose $g was an outer-scope variable), the z of the destination node (z,0) in the output graph need not be the destination node of the current edge in the input graph, and hence there is no way to reach it from the current edgey like∃1v.edge_∗(v, y, z).

of Lemma 8. We first show the construction how to represent each

structural-recursion defined byrecexpressions as a MSO-definable transduction. The construction is by induction on the nesting height ofrec. Let us consider a structural recursion rec(λ($l,$g). &1:=e1, . . . ,&n:=en) of nesting height

h, assuming that subrecexpressions occurring ine1, . . . , en is by induction

hypothesis MSO-definable. The base case of the induction is the caseh= 1, meaning that there are no recexpressions ine1, . . . , en.

We first convert each body expression ei to the following normal form

that has if expressions only at the top-level of the expression (except if

expressions inside the bodies of nestedrec recursions)

if$l =l1 thenspecialize(ei, l1)

else if$l =l2 thenspecialize(ei, l2)

.. .

else if$l =lp thenspecialize(ei, lp)

else&i

where{l1, . . . , lp}=Label ∪ {"data"} and specialize(ei, l) is the expression

obtained from ei by removing all if-subexpressions in a way that each if

-subexpression if$l = l′ then et else ef is recursively replaced with et if

l′ ₌_l _{and with} _e

f otherwise, and by changing all the edges {· · ·$l :e· · · }

to {· · ·l : e· · · }. Since it exhaustively checks all the labels, the final else

branch is unreachable in the standard semantics. Note that, however, by placing &i there, we get a unified treatment for the rather exceptional

ε-edge rule of the structural recursion. That is, instead of dealing with input ε-edges specially, just using the normalized body expression above even for ε-edges would realize the same result. For this reason we prefer the normal form above and do not deal withε-edges exceptionally.

Letei,l=specialize(ei, l) forl∈Label∪ {"data"}andei,ε=&i, and

(24)

We compute the MSO-representation of eachflseparately. Suppose we have

obtained thekl′-copying representation of f_l′ as the set of predicatesvertl ′

i

andedgel_l,i,j,k′ . We can combine them into a single maxl′(k_l′)-copying

trans-duction realizing the original structural recursion by a simple case-analysis formula: verti(x) =vert(x)∨∨l′((∃1v.∃1u.edgel(v, x, u))∧vertl

′

i(x)) and edge_l,i,j,k(x, y, z) = ∨′

l′(∃1v.∃1u.edgel′(v, y, u)) ∧ edgel ′

l,i,j,k(x)) for each

l, i, j, k.

Representing eachfl as a MSO-definable transduction is done just as

il-lustrated in the preceding example. The markers&i’s are represented as the

i-th copies of nodes, variable $g is represented as the 0-th copy of the des-tination node of the currently processed edge, and node/edge-construction expressions are assigned unique numbersjby, e.g., a depth-first traversal on the body expression, and constructed as the j-th copy of the edge. Nested recursion rec(· · ·)(ea) is treated as follows. By inductively processing the

argument expression, we can assumeea is represented by aka-copying

rep-resentation. Since by induction hypothesis the recursion is some k′-copying transduction, we can represent its output by ka times k′ copies, assigning

fresh copy numbersj′_{, . . . , j}′₊_k

ak′−1 (this is basically the same technique

as the composition of MSO-definable transductions of Lemma 7). Its root is represented as thej′-th copy of the root node of the representation ofea.

So far, we have shown that each rec structural recursion is a MSO-definable transduction. Showing the whole UnCAL transformation f to be MSO-definable can be done in quite the same manner. Note that f can contain the designated input variable $db, rec expression, or node-construction expression, but no markers &i, norif expressions (because no

label variable is in the scope). The variable $db is represented as the 0-th copy of the input root node, andrecand node-construction expressions are dealt as same as in fl, except that each node is constructed as the copy of

the root node of the input graph, not as the copy of the “current edge”, which does not exist here.

Wrapping up all the results presented so far, we derive the following main theorem of the paper.

Theorem 3(Sound and Complete Validation). Letsinandsoutbe schemas

written inGS, andf be a transformation written in UnCAL. We can

effec-tively determine the validation problem “for any graph g satisfying sin, the

output graphf(g) satisfies sout” of graph transformations.

Proof. By Theorems 1 and 2, the claim is equivalent to “for any finite tree

t,t ∈ [[sin]] implies f(t) ∈ [[sout]]”. By Lemmas 6, 7 and 8, it is equivalent to “t φsin → f

−1_(φ

sout) holds for any finite treet”. Since it is a validity

(25)

6 Related Work

Verification of model (graph) transformations is an important issue in soft-ware engineering. The approaches presented so far, however, are applied only to certain simple model transformations that can be easily mapped to Prolog or CSP [16, 2], or only for certain properties such as equivalence between input and output models [20]. In contrast, our verification covers a wide class of model transformations and guarantee that the model trans-formation will map schema-correct input model to a schema-correct output model.

Another group of related work on validation of transformations can be found in the area of XML processing, under the nameexact typechecking[25, 19, 17, 12]. Our novelty compared to those work is that we have dealt with graphs and shown the reduction to finite trees. After the reduction and the conversion to MSO-definable transduction, our approach to construct the inverse image f−1(φsout) of the output-schema satisfaction formula follows

the same way as those researches on XML typechecking.

The most directly related work is the simulation-based schema for Un-CAL graph model introduced by Buneman et al. [5]. Sound, complete, and decidable validation algorithm of transformations with those schemas is also given in the same paper. Compared to their schema, our schema language GS has both enhancement and shortage. Their schema is more suitable for expressing properties on data values, because their schema can have unary predicates putting constraints on Data edges (like, “it must match some regular expression pattern”), which cannot be dealt with in our framework. On the other hand, our schema is more inclined to representing structural

propertiesof graphs. For example, GShas the trailing star{. . .∗}type

dec-laration that allows existence of arbitrary edges in addition to the specified ones. Or, we have the union operatorτ1pτ₂ on types. For instance, the SNS

schema example in Section 2.2 contains a typeDatapName, meaning that the

(26)

7 Conclusion and Future Work

We have shown the novel algorithm that verify a graph transformation writ-ten in the core UnCAL is correct with respect to the specified input/output schemas describing structural property of graphs. Our algorithm is sound, complete, and decidable, in the sense that all correct transformations are always reported as correct, and all erroneous transformations are always reported as so. The technical contribution of the paper is summarized as follows:

• we have recognized and demonstrated the usefulness of bisimulation-genericity and compactness of graph transformations in the context of validation, which adds another importance of structural-recursion-based graph transformation in addition to optimization [7] or bidirec-tionalization [14],

• we have proved those two properties also for the schema languageGS; together with the first contibution, these properties allow to reduce the validation problem on graphs to that on trees without losing soundness and completeness, and

• we have given a MSO based semantics of the core UnCAL, which enabled decidable validation.

The challenge for the future is to establish the validation algorithm forfull

UnCAL. As explained in Section 2.3, the major differences between the core UnCAL and the full UnCAL are twofold. One is that nested rec al-lowed to refer to outer variables, which breaks the MSO-definability. For instance, nested UnCAL transformation can produce an output graph poly-nomially larger than an input graph, while MSO-definable transduction has only linear-size increase by definition. We are considering to address the issue by introducing more powerful formalism for describing graph transfor-mations, which still preserves the inverse MSO-definability. Note that, the essentially only property of MSO-definable transductions we really needed in Section 5 is that its inverse image of an formula is again an MSO-formula. The translation itself need not be MSO-definable! In the area of tree-transformation, such powerful yet MSO-definability-preserving for-malism are widely used for the very same purpose (such as, macro tree transducers [11] or pebble tree transducers [19]). We believe that similar technique can be devised for graph transformations.

(27)

any sign whether each empty node was indeed empty in the original tree T or it became empty due to the cut operation. A possible direction is to introduce an extended notion of cuts with richer information, e.g., leaving some special annotation to the cut-nodes so that the transformation can distinguish different kinds of empty nodes.

The other important challenge is to support richer schema languages. We are mainly interested in supporting cardinality constraints on the num-ber of edges. For example, we are planning to allow type declaration like

Person = {name[1] : Data,email[1..∗] : Data} meaning that there must be

exactly one edge labeled name, and at least one edge labeled email. Such

extension can almost subsume the standard schema language [1] used for model-driven software development. Two things must be considered here. The definition of bisimulation-genericity presented in the paper is based on

set-semanticswhere the collection of outgoing edgesE(v) is defined to be a

set of edges. In this setting, cardinality other than [0..∗] and [1..∗] are mean-ingless because duplicating edges are always unified. To sensefully introduce other cardinalities, we need to consider thoroughly the bag- or list-based se-mantics of UnCAL, which is slightly mentioned in the original paper [7] of UnCAL. More severe issue is that introducing cardinalities like [1..∗] (or whatever the one with non-zero lower bound) breaks the compactness of schemas. We need to find some way to address the issue.

References

[1] ATLAS group. KM3: Kernel MetaMetaModel manual. http://www. eclipse.org/gmt/atl/doc/.

[2] D. Bisztray and R. Heckel. Rule-level verification of business process transformations using CSP. Electronic Communications of the EASST, 6, 2007.

[3] D. Blostein, H. Fahmy, and A. Grbavec. Practical use of graph rewrit-ing. Technical report, Queen’s University, 1995.

[4] T. Bray, J. Paoli, C. M. Sperberg-McQueen, and E. Maler. Extensible markup language (XMLTM_). _{http://www.w3.org/XML/, 2000.}

[5] P. Buneman, S. Davidson, M. Fernandez, and D. Suciu. Adding struc-ture to unstrucstruc-tured data. In International Conference on Database

Theory, pages 336–350, 1997.

[6] P. Buneman, S. Davidson, G. Hillebrand, and D. Suciu. A query lan-guage and optimization techniques for unstructured data. InACM

SIG-MOD International Conference on Management of Data, pages 505–

(28)

[7] P. Buneman, M. F. Fernandez, and D. Suciu. UnQL: a query lan-guage and algebra for semistructured data based on structural recur-sion. VLDB Journal: Very Large Data Bases, 9(1):76–110, 2000.

[8] J. Clark and M. Murata. RELAX NG specification. http://www. relaxng.org/, 2001.

[9] B. Courcelle. Monadic second-order definable graph transductions: A survey. Theoretical Computer Science, 126(1):53–75, 1994.

[10] K. Ehrig, E. Guerra, J. de Lara, L. Lengyel, T. Levendovszky, U. Prange, G. Taentzer, D. Varr´o, and S. Varr´o-Gyapay. Model trans-formation by graph transtrans-formation: A comparative study. In Model

Transformations in Practice, 2005.

[11] J. Engelfriet and H. Vogler. Macro tree transducers. Journal of

Com-puter and System Sciences, 31:71–146, 1985.

[12] A. Frisch and H. Hosoya. Towards practical typechecking for macro tree transducers. In Database Programming Languages, pages 246–260, 2007.

[13] J. G. Henriksen, J. Jensen, M. Jørgensen, N. Klarlund, R. Paige, T. Rauhe, and A. Sandholm. Mona: Monadic second-order logic in practice. In Tools and Algorithms for the Construction and Analysis of

Systems, pages 89–110, 1995.

[14] S. Hidaka, Z. Hu, K. Inaba, H. Kato, K. Matsuda, and K. Nakano. Bidirectionalizing structural recursion on graphs. Technical Report GRACE-TR09-03, GRACE Center, National Institute of Informatics, Aug. 2009.

[15] S. Hidaka, Z. Hu, H. Kato, and K. Nakano. Towards a compositional approach to model transformation for software development. In ACM

Symposium on Applied Computing, pages 468–475, 2009.

[16] G. Karsai and A. Narayanan. Towards verification of model transfor-mations via goal-directed certification. In Model-Driven Development

of Reliable Automotive Services, pages 67–83. Springer-Verlag, 2008.

[17] S. Maneth, T. Perst, and H. Seidl. Exact XML type checking in poly-nomial time. In International Conference on Database Theory, pages 254–268, 2007.

[18] T. Mens, S. Demeyer, and D. Janssens. Formalising behaviour preserv-ing program transformations. In International Conference on Graph

(29)

[19] T. Milo, D. Suciu, and V. Vianu. Typechecking for XML transformers.

Journal of Computer and System Sciences, 66:66–97, 2003.

[20] A. Narayanan and G. Karsai. Towards verifying model transformations.

Electronic Notes in Theoretical Computer Science, 211:191–200, 2008.

[21] M. O. Rabin. Decidability of second-order theories and automata on infinite trees. Transactions of American Mathematical Society, 141:1– 35, 1969.

[22] N. Robertson and P. D. Seymour. Graph minors. II. algorithmic aspects of tree-width. Journal of Algorithms, 7:309–322, 1986.

[23] J. W. Thatcher and J. B. Wright. Generalized finite automata the-ory with an application to a decision problem of second-order logic.

Mathematical Systems Theory, 2:57–811, 1968.

[24] H.-J. Tiede and S. Kepser. Monadic second-order logic and transitive closure logics over trees. InWorkshop on Logic, Language, Information

and Computation, pages 189–199, 2006.

[25] A. Tozawa. Towards static type checking for XSLT. InACM Symposium

on Document Engineering, pages 18–27, 2001.

[26] B. A. Trakhtenbrot. Impossibility of an algorithm for the decision prob-lem for finite classes. Doklady Akademiia Nauk SSSR, 70:569–572, 1950.